profile maximum likelihood
The Broad Optimality of Profile Maximum Likelihood
We study three fundamental statistical-learning problems: distribution estimation, property estimation, and property testing. We establish the profile maximum likelihood (PML) estimator as the first unified sample-optimal approach to a wide range of learning tasks. In particular, for every alphabet size $k$ and desired accuracy $\varepsilon$: \textbf{Distribution estimation} Under $\ell_1$ distance, PML yields optimal $\Theta(k/(\varepsilon^2\log k))$ sample complexity for sorted-distribution estimation, and a PML-based estimator empirically outperforms the Good-Turing estimator on the actual distribution; \textbf{Additive property estimation} For a broad class of additive properties, the PML plug-in estimator uses just four times the sample size required by the best estimator to achieve roughly twice its error, with exponentially higher confidence; \textbf{$\alpha$-R\'enyi entropy estimation} For an integer $\alpha> 1$, the PML plug-in estimator has optimal $k^{1-1/\alpha}$ sample complexity; for non-integer $\alpha> 3/4$, the PML plug-in estimator has sample complexity lower than the state of the art; \textbf{Identity testing} In testing whether an unknown distribution is equal to or at least $\varepsilon$ far from a given distribution in $\ell_1$ distance, a PML-based tester achieves the optimal sample complexity up to logarithmic factors of $k$. With minor modifications, most of these results also hold for a near-linear-time computable variant of PML.
Instance Based Approximations to Profile Maximum Likelihood
In this paper we provide a new efficient algorithm for approximately computing the profile maximum likelihood (PML) distribution, a prominent quantity in symmetric property estimation. We provide an algorithm which matches the previous best known efficient algorithms for computing approximate PML distributions and improves when the number of distinct observed frequencies in the given instance is small. We achieve this result by exploiting new sparsity structure in approximate PML distributions and providing a new matrix rounding algorithm, of independent interest. Leveraging this result, we obtain the first provable computationally efficient implementation of PseudoPML, a general framework for estimating a broad class of symmetric properties. Additionally, we obtain efficient PML-based estimators for distributions with small profile entropy, a natural instance-based complexity measure. Further, we provide a simpler and more practical PseudoPML implementation that matches the best-known theoretical guarantees of such an estimator and evaluate this method empirically.
On the Efficient Implementation of High Accuracy Optimality of Profile Maximum Likelihood
We provide an efficient unified plug-in approach for estimating symmetric properties of distributions given $n$ independent samples. Our estimator is based on profile-maximum-likelihood (PML) and is sample optimal for estimating various symmetric properties when the estimation error $\epsilon \gg n^{-1/3}$. This result improves upon the previous best accuracy threshold of $\epsilon \gg n^{-1/4}$ achievable by polynomial time computable PML-based universal estimators \cite{ACSS20, ACSS20b}. Our estimator reaches a theoretical limit for universal symmetric property estimation as \cite{Han20} shows that a broad class of universal estimators (containing many well known approaches including ours) cannot be sample optimal for every $1$-Lipschitz property when $\epsilon \ll n^{-1/3}$.
Reviews: The Broad Optimality of Profile Maximum Likelihood
The paper shows that profile maximum likelihood, an idea from the distribution estimation literature from a couple of years ago, enjoys optimality properties for a large class of property estimation tasks. The class of tasks includes a number of popular problems studied in the distribution learning literature. All reviewers liked the paper and advocate acceptance. Please do go over the reviews and incorporate any feedback for the camera ready.
Review for NeurIPS paper: Instance Based Approximations to Profile Maximum Likelihood
Summary and Contributions: Statistical property estimation is an important and active area at the intersection of theoretical computer science, statistics, and information theory. For example, a basic question in this realm: given n iid samples from an unknown discrete distribution p, how well can we estimate the entropy H(p), and what is an efficient algorithm for doing so? Recent efforts have shown that, for any symmetric property, the profile maximum likelihood estimator is universally minimax optimal for a wide range of parameters. While this at first seemed like a purely theoretical result, algorithmic efforts quickly caught up to show that 1) efficient approximation of the profile maximum likelihood estimator is possible and 2) approximate profile maximum likelihood estimation suffices for minimax optimality. In this context, this paper refines recent approximation algorithms from exp(-\sqrt{n} log n) to exp(-k log n) where k is the number of observed frequencies, with k O(\sqrt{n}).
Instance Based Approximations to Profile Maximum Likelihood
In this paper we provide a new efficient algorithm for approximately computing the profile maximum likelihood (PML) distribution, a prominent quantity in symmetric property estimation. We provide an algorithm which matches the previous best known efficient algorithms for computing approximate PML distributions and improves when the number of distinct observed frequencies in the given instance is small. We achieve this result by exploiting new sparsity structure in approximate PML distributions and providing a new matrix rounding algorithm, of independent interest. Leveraging this result, we obtain the first provable computationally efficient implementation of PseudoPML, a general framework for estimating a broad class of symmetric properties. Additionally, we obtain efficient PML-based estimators for distributions with small profile entropy, a natural instance-based complexity measure.
The Broad Optimality of Profile Maximum Likelihood
We study three fundamental statistical-learning problems: distribution estimation, property estimation, and property testing. We establish the profile maximum likelihood (PML) estimator as the first unified sample-optimal approach to a wide range of learning tasks. In particular, for every alphabet size k and desired accuracy \varepsilon: \textbf{Distribution estimation} Under \ell_1 distance, PML yields optimal \Theta(k/(\varepsilon 2\log k)) sample complexity for sorted-distribution estimation, and a PML-based estimator empirically outperforms the Good-Turing estimator on the actual distribution; \textbf{Additive property estimation} For a broad class of additive properties, the PML plug-in estimator uses just four times the sample size required by the best estimator to achieve roughly twice its error, with exponentially higher confidence; \textbf{ \alpha -R\'enyi entropy estimation} For an integer \alpha 1, the PML plug-in estimator has optimal k {1-1/\alpha} sample complexity; for non-integer \alpha 3/4, the PML plug-in estimator has sample complexity lower than the state of the art; \textbf{Identity testing} In testing whether an unknown distribution is equal to or at least \varepsilon far from a given distribution in \ell_1 distance, a PML-based tester achieves the optimal sample complexity up to logarithmic factors of k . With minor modifications, most of these results also hold for a near-linear-time computable variant of PML.
On the Efficient Implementation of High Accuracy Optimality of Profile Maximum Likelihood
We provide an efficient unified plug-in approach for estimating symmetric properties of distributions given n independent samples. Our estimator is based on profile-maximum-likelihood (PML) and is sample optimal for estimating various symmetric properties when the estimation error \epsilon \gg n {-1/3} . This result improves upon the previous best accuracy threshold of \epsilon \gg n {-1/4} achievable by polynomial time computable PML-based universal estimators \cite{ACSS20, ACSS20b}. Our estimator reaches a theoretical limit for universal symmetric property estimation as \cite{Han20} shows that a broad class of universal estimators (containing many well known approaches including ours) cannot be sample optimal for every 1 -Lipschitz property when \epsilon \ll n {-1/3} .